Use of Geochemistry Data Collected by the Mars Exploration Rover
Spirit in Gusev Crater to Teach Geomorphic Zonation through
Principal Components Analysis

ABSTRACT

This paper presents a laboratory exercise used to teach principal components analysis (PCA) as a means of surface zonation.
The lab was built around abundance data for 16 oxides and elements collected by the Mars Exploration Rover Spirit
in Gusev Crater between Sol 14 and Sol 470. Students used PCA to reduce 15 of these into 3 components, which, after
quartimax rotation, strikingly divided the surface traversed by Spirit into three distinct zones. Students then used
such concepts as the Bowen reaction series, typical minerals in Earth's basalts and andesitic arcs, the periodic table, and
the Goldschmidt classification, together with Pancam images from Spirit and the Mars Orbiter Camera, to interpret the
surfaces over which the rover moved. Students found this foray to Mars a challenging but enjoyable project, and it made
PCA memorable to them long after the class had ended. Some variant of this lab could work for multivariate statistics
courses in geology, geography, and environmental science, as well as advanced courses in the content of those disciplines,
particularly those dealing with zonation. ©2011 National Association of Geoscience Teachers. [DOI: 10.5408/1.3604826]


INTRODUCTION 

This paper presents an exercise that uses Mars Exploration
Rover geochemical data to teach principal components
analysis (PCA) for geomorphic or geological zonation. The
data come from the Spirit rover's Alpha Particle X-ray
Spectrometer (APXS), which collected spectra from 93 rocks and
soil samples (Gellert et al., 2006) during its travel over three
distinctive zones on the floor of Gusev Crater. These zones
consisted of a cratered basaltic plain, the West Spur of the
Columbia Hills with bedded materials and evaporites, and
the northwest side of Husband Hill, where very diverse
aqueous and acid-aqueous altered rocks and soils were
found. PCA is a data reduction technique that has
increasingly been used in the geosciences since the early 1960s,
so familiarity with it is valuable in the education of
geoscience majors. The APXS data can make the technique
memorable to such majors, as it produces a coherent zonation
from 15 different oxides and elements.

A classic task in the geosciences is zonation of complex
surface patterns into areal units and demarcating transition
zones or boundaries between them, often along a transect
in the field. So, for example, a soil catena can be zoned by
changes in soil particle size, underlying bedrock and regolith,
topographic relief, drainage, erosion and deposition
processes, weathering, organic matter, and geochemistry
(Milne, 1935; Bushnell, 1942; Webster, 1973; Raynolds et al.,
2006). Ground-penetrating radar can be used along a
transect to infer subsurface stratigraphy for geological
mapping (Baker and Jol, 2007). An environmental ecotone
might be zoned by field sampling of soils and censusing of
species presence and abundance along a transect. For
example, a transect could be taken down a catena, across a
wetland-upland interface, or through a seasonal surface
water and groundwater boundary (Fortin et al., 2000).

Zonation can be vertical and temporal in geological
usage, not just horizontal and spatial in mapping usage.
So, for example, fossils, grain size, bulk density, and
geochemistry can be used for temporal zonation and
sequencing of stratigraphic units (e.g., Patterson et al., 2000; Brown
and Pasternack, 2004; Peterson et al., 2008).

Zonation, then, is a common task in the field and
laboratory activities of geoscientists. The process can seem
superficially straightforward, but the zoning schemes that
result can color analytic results. Complications include
scale, edge effects, spatial autocorrelation, and aggregation
effects. These distortions and biases are collectively called
the Modifiable Areal Unit Problem or MAUP (Dark and
Bram, 2007) or the analogous Modifiable Temporal Unit
Problem (MTUP). The MTUP is less commonly discussed,
largely in criminology contexts (e.g., Taylor, 2010), but it is
clearly relevant to geoscientists' work. Concern about
zoning has driven development of statistical techniques to let
image processing, GIS, and statistical software handle the
kinds of remote sensing, field, and laboratory data
generated and used by geoscientists.

One of the common statistical techniques used in
zonation is principal components analysis or PCA, a member of
the factor analytic (FA) family of techniques. PCA is
primarily concerned with data reduction or grouping of many
variables into fewer components. FA is mainly concerned
with identifying or testing underlying factors that may not
be directly measurable themselves but which are expressed
in commonalities in measurements of empirical variables
(Bryant and Yarnold, 1995; Rogerson, 2006; Davis, 2002).
PCA is more empirical and inductive; FA is more
theoretical and in some versions can test deductive hypotheses
about expected underlying factor structure.

In the geosciences, PCA/FA is a common method
underlying the unsupervised classification of remote
sensing imagery. It can also be performed on large data sets
collected in the field. These could include samples taken down
through a geological column, sediment core, or ice core, for
example, or observations across space, as is the
case in the laboratory exercise presented here. It can, thus,
assist in both temporal and spatial zonation, making it a
tool of increasing utility to a variety of geoscientists. For
that reason, PCA/FA, particularly PCA, has been increasingly
encountered in the geoscience literature since its
disciplinary debut in the early 1960s (e.g., Reyment, 1961; Wong,
1963; Imbrie and Van Andel, 1964). Practice
in its application would therefore enhance the professional
preparation of geoscience students, especially at the advanced
undergraduate and beginning graduate levels.

For all its usefulness, PCA/FA is not the most "user
friendly" approach for students. The mathematical
complexities are now easily handled by the common statistical
software packages. These include SPSS, Statistica, Minitab,
Matlab, Scilab, R, the freeware programs PAST and
WinIDAMS, and others. Their widespread availability makes
PCA/FA accessible to undergraduate students. The
underlying concepts, however, are difficult to convey, because
PCA/FA represents the many variables in the analysis as
dimensions and the data collected as occupying an
n-dimensional data cloud. Trying to "visualize" this is a tough
sell to students! The point of PCA, especially, is to reduce
the dimensionality of the data cloud to a small number of
usually orthogonal components. These can then be
projected through the data cloud and aligned with most of the
data points when "viewed" through various "rotations" of
the emergent model. Each of the original variables will
"load" highly along one of the components (some may load
less dramatically on more than one component). That is,
most of the original variables will show strong correlations
with one of the artificial variables, or components. The
result for geoscientists can be a meaningful zonation of time
or space. PCA zonation is generated from the data
themselves, rather than from a priori schemes that can give rise
to the MAUP and MTUP. The only way to motivate
students to acquire this tool is to show it in operation on a data
set that otherwise would overwhelm them but, processed in
PCA, becomes intelligible to them.

Anyone who has ever taught a statistics course knows
that the worst part is finding or creating a data set that
meets the requirements of a given technique, produces
results that can enable teachable moments, and, ideally,
has something to do with the discipline in which the
statistics course is taught. This paper introduces a
geoscience-related data set, discusses how it was shaped for classroom
use, shows the results of a PCA taught through its use, and
then evaluates student outcomes. The exercise vividly
models the utility of PCA for geochemical data reduction
and geomorphic zonation.

The database contains 16 oxide and element abundances
collected from untouched, brushed, or abraded rocks,
which were selected by the Mars Exploration Rover science
team for APXS during Spirit's traverse in Gusev Crater. The
lab exercise, using SPSS and Excel, is available at
<http://www.csulb.edu/~rodrigue/geog400/project5.html>.

DATA AND METHODS 

The data were originally published in Gellert et al.
(2006), where they are presented in the second table of the
article. This table can be saved as a tab-delimited file for
import into a spreadsheet program. There, the data can be
further edited to fit the needs of a statistical software
program or an instructor's goals.

This table uses as its record label the "sol," or martian
day after landing, on which the sample record was
taken. The table covers the first 470 sols of Spirit's
activities. Martian sols are slightly longer than Earth days at 24
h, 39 min, and 35 s, and the date of landing for Spirit was 4
January 2004 on Earth. The second variable is the type of
surface from which Spirit's APXS took spectra on a given
sol. These include rock undisturbed by Spirit (RU), rock
brushed off (RB), rock "RATted" or abraded by the rock
abrasion tool (RAT), RAT fines or abrasion debris (RF), soil
undisturbed (SU), soil disturbed (SD), and soil trenched
(ST). The third variable is the sometimes whimsical
nickname given to an individual rock or soil surface. Norm, or
geometric norm, is a relative measure of the standoff
distance between the APXS and the sample surface in
millimeters. This distance affects how much background
elemental noise is included in a reading. The variable
normalizes the sum of all oxides to 100% to allow
measurement of relative contributions by each oxide or element. T
is the time in hours that the instrument took to integrate
the spectrum. Following these five columns are two
columns for each of the oxides and elements. The first gives
the relative abundances of the 12 oxides (wt. %) and 4
elements (parts per million), and the other reports the
statistical error bounds set at ±2 standard deviations. The table,
then, has 37 columns (5 + 2 × 16) and 93 rows of records.

The use of PCA on these data offers opportunities to
encourage students' critical thinking about examples of
PCA they encounter in the literature or their own future
work. The data set presented here conforms to some but
not all of the assumptions for the proper use of the
technique, and students should be able to identify these
departures and conclude that their results will be tentative. For
example, the number of records is below the 100 usually
recommended as a minimum sample size for PCA.

The variables used should, ideally, be roughly normal
in the distribution of their values, though PCA does not
depend on normality in all variables as a critical
assumption. Students should get in the habit of assessing
distributions, though. One way is to construct histograms of each
of the 15 variables for visual inspection of their
distributions. Alternatively, they can compare each variable's
mean to its median and then calculate Pearson's
skewness measure: Sk = 3(Mn - Md)/s, where Sk is
Pearson's skew, Mn is the mean, Md is the median, and s
is the standard deviation for the sample. A variable with
|Sk| > 0.2 can be considered skewed, with the direction of
the outliers given by the sign of the statistic. Statistics packages
also commonly include tests for non-normality, such as the
Shapiro-Wilk W, the Kolmogorov-Smirnov, or the
D'Agostino-Pearson omnibus test. However students evaluate
normality, some of the variables are approximately normal
in distribution, but some will emerge as non-normal, and
bromine is markedly right-skewed.
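
The skewness check described above can be sketched in a few lines of Python. This is only an illustration, not part of the original lab, which used SPSS and Excel. The function names and the sample values are assumptions made up for the example; they are not APXS readings.

```python
from statistics import mean, median, stdev


def pearson_skew(values):
    """Pearson's skewness measure: Sk = 3 * (mean - median) / s."""
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        raise ValueError("standard deviation is zero; skewness is undefined")
    return 3 * (mean(values) - median(values)) / s


def is_skewed(values, cutoff=0.2):
    """Flag a variable whose |Sk| exceeds the cutoff suggested in the text."""
    return abs(pearson_skew(values)) > cutoff


# Hypothetical abundances with one large outlier (not actual APXS data).
sample = [30.0, 32.0, 35.0, 38.0, 40.0, 45.0, 170.0]
sk = pearson_skew(sample)
direction = "right" if sk > 0 else "left"
print(f"Sk = {sk:.2f}; skewed: {is_skewed(sample)} ({direction} tail)")
```

A positive Sk points to outliers on the high side, which is the pattern the text reports for bromine.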

Other assumptions of PCA are fully met. The
measurement level for all variables entered into PCA must be
scalar, whether interval or ratio, and these are. Having
students check on this will help reinforce their sometimes
shaky grip on the concept of measurement levels (nominal,
ordinal, scalar). The subjects-to-variables ratio (SVR), or
the ratio of records to columns, should be at least 5. With
93 records and 15 variables (the dropping of zinc is
discussed below), this data set provides an SVR of 6.2 (leaving
zinc in gives an SVR of 5.8).
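
The two sample-size checks mentioned so far (at least 100 records, and an SVR of at least 5) amount to simple arithmetic. A minimal sketch follows; the function name and the returned dictionary keys are illustrative assumptions.

```python
def check_pca_sample(n_records, n_variables, min_records=100, min_svr=5.0):
    """Report the subjects-to-variables ratio and the two rules of thumb."""
    svr = n_records / n_variables
    return {
        "svr": round(svr, 1),
        "enough_records": n_records >= min_records,
        "svr_ok": svr >= min_svr,
    }


# Figures from the text: 93 records, 15 variables (16 if zinc is kept).
print(check_pca_sample(93, 15))
print(check_pca_sample(93, 16))
```

Both versions of the data set pass the SVR rule and fail the 100-record rule, which is the mixed picture students are expected to notice.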

Given that the purpose of doing the PCA here is for
zonation of the crater floor surface by oxide and element
composition, many of the columns may be dispensed with
for the exercise. This leaves only those columns with
identifiers and the oxide and element abundances. The result is
a 93 record by 16 column (1 identifier and 15 oxides and
elements) spreadsheet. The identifier should be sol.

Instructors can import the resulting spreadsheet into a
statistical package at this point and run the PCA several
times to become thoroughly familiar with the package's
PCA defaults and options and the effects they have on the
outcomes. The defaults on SPSS, for example, will produce
four components that will be very difficult for students to
interpret. The fourth component has only zinc as a
high loading variable on it. Some options at this point
might be forcing the software to meet a higher cutoff value
to retain a component or specifying that only three
components are desired. This entails more lecture and
demonstration work to get students to modify the PCAs and to
understand the modifications, when they are struggling
just to grasp the peculiar PCA hyperspace to begin with.

Alternatively, the zinc column can be omitted, which
leads to a simple three component solution using the
common PCA defaults. This is ideal for demonstration
purposes and for the subsequent student work needed to
interpret the outcome, so I recommend sacrificing the
zinc data for the pedagogical goals of the lab. The
discussion below uses the 15 oxides and elements version of the
spreadsheet, which may be accessed at
<http://www.csulb.edu/~rodrigue/geog400/gusevminimal.xls>.
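
For instructors who prefer to script the trimming rather than edit the spreadsheet by hand, the step can be sketched with Python's standard csv module. The header names used here (Sol, Type, Sample, Norm, T, and an "_err" suffix for the error columns) are assumptions for illustration; the actual headers in a saved copy of the table will differ and should be substituted. The two data rows are invented.

```python
import csv
import io

# Hypothetical names for the non-chemical columns other than the sol label.
METADATA = ("Type", "Sample", "Norm", "T")


def trim_table(tsv_text, id_column="Sol", drop=("Zn",), error_suffix="_err"):
    """Keep the sol identifier plus abundance columns, minus dropped ones."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    keep = [
        name for name in reader.fieldnames
        if name == id_column
        or (name not in METADATA
            and name not in drop
            and not name.endswith(error_suffix))
    ]
    return keep, [{name: row[name] for name in keep} for row in reader]


# Invented two-row excerpt with only three chemical variables.
sample = (
    "Sol\tType\tSample\tNorm\tT\tNa2O\tNa2O_err\tMgO\tMgO_err\tZn\tZn_err\n"
    "14\tSU\tExampleSoil\t1.0\t2.5\t2.8\t0.2\t8.3\t0.1\t300\t20\n"
    "18\tRU\tExampleRock\t1.1\t3.0\t2.5\t0.3\t10.1\t0.2\t110\t15\n"
)
columns, records = trim_table(sample)
print(columns)
print(records[0])
```

Applied to the full table, the same filter would leave the sol identifier and the 15 abundance columns described above.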

RESULTS 

Students should be guided through the process that
the statistics software uses to generate the components. It
is important to have the software save the components as
regression variables, which will be appended as new
columns in the data display matrix. These three new columns
should then be copied to the original spreadsheet for
graphing (both Excel and OpenOffice/LibreOffice Calc
will work satisfactorily).

Eigenvalues 

An important part of the output is the total variance
explained, showing the eigenvalues for each eigenvector or
principal component. The sum of eigenvalues equals the
number of original variables, but the percentage of total
variation in the data cloud explained by each additional
component declines sharply. This produces a progressively
smaller gain in cumulative variance explained with each
new component extracted. At some point, the marginal
gain in cumulative explanation becomes insignificant. The
eigenvalue for each component or the percentage of
explained variance for each component can be graphed
against component number in an X-Y plot. This graph is
commonly referred to as a "scree plot," in a refreshingly
geoscientific turn of phrase! The scree graph can identify
the number of useful components visually by the nick
point between the steeper part of the slope and the flatter
part. The software package will default to an eigenvalue of
1.0, ceasing to extract new components with eigenvalues
smaller than that, which accords well with visual
examination of the scree plot.
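
The bookkeeping behind the variance table and scree plot can be illustrated with a short Python sketch. The eigenvalues below are invented for illustration (chosen only so that they sum to 15, the number of variables); they are not the values SPSS produces for the Gusev data. The function name is likewise an assumption.

```python
def summarize_eigenvalues(eigenvalues, cutoff=1.0):
    """Return (component, eigenvalue, % variance, cumulative %, retained)."""
    total = sum(eigenvalues)
    rows, cumulative = [], 0.0
    for number, value in enumerate(eigenvalues, start=1):
        percent = 100.0 * value / total
        cumulative += percent
        rows.append((number, value, percent, cumulative, value >= cutoff))
    return rows


# Invented eigenvalues for 15 standardized variables (they sum to 15).
EIGENVALUES = [6.0, 3.0, 2.4, 0.9, 0.6, 0.5, 0.4, 0.3,
               0.25, 0.2, 0.15, 0.12, 0.08, 0.06, 0.04]

rows = summarize_eigenvalues(EIGENVALUES)
retained = sum(1 for row in rows if row[4])

# A crude text scree plot: bar length is proportional to the eigenvalue.
for number, value, percent, cumulative, keep in rows:
    flag = "keep" if keep else "drop"
    bar = "#" * round(value * 5)
    print(f"PC{number:<2} {value:5.2f} {percent:5.1f}% {cumulative:6.1f}% "
          f"{flag} {bar}")
print(f"Components with eigenvalue >= 1.0: {retained}")
```

With these made-up numbers, the break in the bars falls after the third component, the same place the eigenvalue cutoff of 1.0 stops extraction.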

Component Matrix and Rotation 

Another critical part of the output is the component
matrix, which shows the loadings of each of the original
variables onto each of the extracted artificial variables or
components. The first component will show high positive
or negative loadings for several variables, and only a very
few will be close to 0. The second component will also
show that pattern of high positive and negative loadings.
The high loading variables, however, are typically
variables that had very low loadings on the first component.
There are often fewer high loaders on the second
component than on the first, as there is less variance to account
for after the first component "soaked" up a substantial
amount of it. Also, the highest loadings on the second
component may well be lower than the highest loadings on the
first component, though this is not always the case. The
pattern continues into the third component, with fewer
and fewer high loading variables and often, though not
always, lower maximum loadings.

The original raw component matrix will show this
pattern as described, but it is very common for the range of
loadings to be small enough to make it hard for students to
judge which of the variables are "high" loading versus
"low" loading. To make the picture crisper, it is possible to
rotate the model or, more accurately, rotate the vantage
point from which the model is "viewed." The goal here is
to figure out the polarities represented by the components
and, in some fields, it is common practice to come up with
evocative names for these artificial variables, though this is
less commonly done in the geosciences.

For PCA, the two most common rotations are varimax
and quartimax. Varimax rotates the component matrix so
as to drive some of the variable absolute loadings within a
component column higher at the expense of driving other
variable loadings closer to 0 on that component column.
This exaggerates the range of absolute values down the
column. Quartimax does the same sort of thing, but it
exaggerates differences along the variable rows, helping assign
variables more readily to components. This seems the
more helpful with this particular data set, so I would
encourage the reader to have students perform a quartimax
rotation and concentrate on the resulting rotated
component matrix. Varimax will work nicely enough, though. If
that is the only rotation method provided by the software,
an instructor can be confident that students will still be
able to interpret their results well with that rotation
system, too.
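
The difference between the two rotations can be made concrete with a toy two-component example in Python. This is a sketch under stated assumptions: the loading matrix is invented, the search is a simple grid over rotation angles rather than the iterative algorithms statistical packages use, and Kaiser normalization is omitted. The criteria are the standard ones: quartimax maximizes the sum of fourth powers of all loadings, and raw varimax maximizes the summed variance of squared loadings within each component column.

```python
import math


def rotate(loadings, theta):
    """Orthogonally rotate a two-component loading matrix by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings]


def quartimax_criterion(loadings):
    """Sum of fourth powers of all loadings (simplifies the variable rows)."""
    return sum(x ** 4 for row in loadings for x in row)


def varimax_criterion(loadings):
    """Summed variance of squared loadings within each component column."""
    n = len(loadings)
    total = 0.0
    for column in zip(*loadings):
        squares = [x * x for x in column]
        mean_sq = sum(squares) / n
        total += sum((v - mean_sq) ** 2 for v in squares) / n
    return total


def best_angle(loadings, criterion, steps=900):
    """Grid-search angles in [0, pi/2) for the one maximizing a criterion."""
    angles = [k * (math.pi / 2) / steps for k in range(steps)]
    return max(angles, key=lambda t: criterion(rotate(loadings, t)))


# Invented unrotated loadings for four variables on two components.
raw = [(0.7, 0.5), (0.8, 0.4), (0.6, -0.6), (0.5, -0.7)]

for name, criterion in (("quartimax", quartimax_criterion),
                        ("varimax", varimax_criterion)):
    theta = best_angle(raw, criterion)
    rotated = rotate(raw, theta)
    print(name, f"angle = {math.degrees(theta):.1f} degrees")
    for a, b in rotated:
        print(f"  {a:6.3f} {b:6.3f}")
```

Because the rotation is orthogonal, each variable's sum of squared loadings is unchanged; only the way that sum is split between the components shifts.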

One thing I have found that helps students (and
myself) interpret a component matrix is to use a
highlighter on the printout to mark the highest loading (on
component 1, 2, or 3) for each variable. At this point, they
can apply their geoscience background to figure out
associations among variables loading highly positively on each
component and among those other variables loading
highly negatively on each component. Table I presents the
quartimax rotated component matrix generated by SPSS
with these data, with high loadings highlighted.

TABLE I: Rotated component matrix.

                   Component
Variable      1          2          3
Na2O        0.772*     0.221      0.362
MgO        -0.629*    -0.573      0.090
Al2O3       0.769*     0.306      0.493
SiO2        0.169     -0.011      0.965*
P2O5        0.710*     0.195     -0.548
SO3        -0.157     -0.140     -0.920*
Cl          0.101     -0.870*     0.019
K2O         0.751*     0.029      0.013
CaO         0.196      0.885*    -0.073
TiO2        0.866*     0.203     -0.017
Cr2O3      -0.830*     0.385      0.121
MnO        -0.642*     0.571     -0.020
FeO        -0.850*     0.162     -0.179
Br          0.099     -0.648*    -0.232
Ni         -0.313     -0.675*    -0.009

Note: Extraction method: principal component analysis. Rotation method:
quartimax with Kaiser normalization. The highest component loading for each
variable is marked with an asterisk (*).
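
The highlighter step can also be sketched in Python, using the loadings from Table I. The function name and the dictionary layout are illustrative assumptions; the numbers are those reported in the table.

```python
# Rotated loadings from Table I: (component 1, component 2, component 3).
LOADINGS = {
    "Na2O":  (0.772, 0.221, 0.362),
    "MgO":   (-0.629, -0.573, 0.090),
    "Al2O3": (0.769, 0.306, 0.493),
    "SiO2":  (0.169, -0.011, 0.965),
    "P2O5":  (0.710, 0.195, -0.548),
    "SO3":   (-0.157, -0.140, -0.920),
    "Cl":    (0.101, -0.870, 0.019),
    "K2O":   (0.751, 0.029, 0.013),
    "CaO":   (0.196, 0.885, -0.073),
    "TiO2":  (0.866, 0.203, -0.017),
    "Cr2O3": (-0.830, 0.385, 0.121),
    "MnO":   (-0.642, 0.571, -0.020),
    "FeO":   (-0.850, 0.162, -0.179),
    "Br":    (0.099, -0.648, -0.232),
    "Ni":    (-0.313, -0.675, -0.009),
}


def assign_components(loadings):
    """Group variables by the component on which |loading| is largest."""
    groups = {}
    for variable, row in loadings.items():
        index = max(range(len(row)), key=lambda i: abs(row[i]))
        groups.setdefault(index + 1, []).append((variable, row[index]))
    return groups


groups = assign_components(LOADINGS)
for component in sorted(groups):
    members = ", ".join(f"{name} ({value:+.3f})"
                        for name, value in groups[component])
    print(f"Component {component}: {members}")
```

Sorting each group by sign then gives students the two "ends" of each component to interpret.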


Identifying and Understanding the Extracted 
Components 

Why might potassium oxide and alumina cluster
together with high positive loadings on component 1, for
example (Table I)? Why, alternatively, might magnesium
oxide and ferrous oxide also be packaged together on
component 1, but with very high negative loadings? What is
the dichotomy being picked up by component 1? Among
the resources I gave students for sorting this out was the
annotated rock composition chart at
<http://www.csulb.edu/~rodrigue/geog400/rockcompositions.jpg>.

The two halogens come out with high negative
loadings on component 2, while calcium oxide pops up with
a high positive loading on that component (Table I).
Resources to help students interpret that component
would include the periodic table and, for the calcium issue,
the rock composition chart. Nickel loads negatively along
with the halogens, which can be used for a side discussion
about what might put the highly siderophilic nickel on the
surface of a planet.

On the third component, only two variables load very
highly, silica in the positive direction and sulfur trioxide in
the negative direction (Table I). I point students back to the
rock composition chart. Additionally, I have students look
up the Goldschmidt classification of the periodic table into
siderophilic, chalcophilic, lithophilic, and atmophilic
elements. A discussion about sulfate chemistry in water might
be helpful, too, as silica can be freed from mafic materials
by small amounts of strongly sulfur-acidified cold water
moving through them. Once liberated by acid-water
alteration of basalt, the silica can then be precipitated by
evaporation (McAdam et al., 2008). That may be why silica and
sulfur trioxide are linked on the third component.

This analysis of the polarities among variables loading
onto each of the three components is the most challenging
part of the lab for students. It requires them to excavate
and apply their basic geoscience background to figure out
the pattern in the statistics. On the first component, they
should suggest the mantle-crust or mafic-felsic dichotomy
or the Bowen reaction series. On the second component,
they might come up with an aqueous versus nonaqueous
or evaporite versus nonevaporite theme. On the third
component, students might propose the
chalcophilic-lithophilic dichotomy, mantle-crust division, or acidic
alteration of basalt.

Geovisualization and Zonation 

Once students have some idea what the three
components might mean, they can graph the departures of each
component from neutral by sol, or across space. This is
most easily accomplished in a spreadsheet, so have the
students copy the three columns for PC1, PC2, and PC3 into
their original spreadsheet.

I have students make a line chart of the abundance of
each oxide or element by sol. This can be done 15 times or,
to reduce tedium, a few line charts can be constructed with
several variables on each chart. For example, two could be
created from the variables with high positive loadings and
with high negative loadings on PC1, while another two could
show those with high loadings in either direction on PC2
and on PC3. Examination of these many line charts will
prove intentionally frustrating to the students, as no real
pattern readily emerges, and it can be hard to pick out
similarities between any pair of oxides or elements. A
spreadsheet containing the original data, the component scores,
and several XY graphs is available in Excel 97/2000/XP
format at
<http://www.csulb.edu/~rodrigue/geog400/GusevChemJGE.xls>.

At this point, I have students make one chart with the
three lines corresponding just to the component scores,
instead of the variable values (Fig. 1). Students highlight
the sol column and, holding down the control key,
separately tap each of the three component columns in turn as
well. The resulting line chart will be pretty messy, but
students can clean it up by formatting the X axis sol labels to
run vertically and the Y axis to have 0 or 1 decimal places
in order to declutter it.

Have the students pay close attention to the first part
of Spirit's traverse and identify which component
diverges most strongly upward most of the time and
which other component diverges most strongly
downward. PC2 dominates in the positive direction and
PC1 in the negative direction, while PC3 stays fairly close
to 0. As their eyes move to the right, they will notice that a
different pair of components diverges most strongly. This
time PC2 diverges very strongly below neutral and PC3,
most of the time, diverges somewhat above neutral, while
PC1, most of the time, stays closest to neutral. At the
rightmost part of the graph, things change quite drastically,
with PC3 diverging very unstably and often with extreme
values below neutral. PC1 shows a similarly spiky positive
dominance of most of this area, with PC2, mostly, clinging
to neutral. Thus, three zones have been identified by PCA.
Students can use the line-draw function (or just a pencil)
to sink vertical lines marking the points on the graph
where the components shift their positive and negative
dominance patterns. They should note the sols on which
these switches take place (roughly sol 158 and sol 315).
These mark the boundaries among the three zones, or the
sols on which Spirit crossed onto a different kind of terrain.
Students are generally pretty impressed by how readily
the landscape is zoned, especially if they had struggled to
make heads or tails of the individual oxide and element
line charts. Now, they can compare this zonation visually
with the landscape of Gusev Crater.

FIGURE 1: PCA factor scores for MER Spirit APXS oxides and elements.
Y axis: factor scores. X axis: sol after Spirit's landing.
Source: Gellert et al. (2006), Table 2. PCA: C.M. Rodrigue (2010).
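
The by-eye procedure above, finding where the pair of dominant components switches, can be imitated in a small Python sketch. This is not how the lab was run (students worked from the chart), and the scores, sols, function names, and the min_run persistence rule are all assumptions invented for illustration. The toy scores merely mimic the three dominance patterns described in the text.

```python
def dominance(scores):
    """Return (most positive component, most negative component), 1-based."""
    top = max(range(len(scores)), key=lambda i: scores[i])
    bottom = min(range(len(scores)), key=lambda i: scores[i])
    return top + 1, bottom + 1


def zone_boundaries(records, min_run=3):
    """Sols at which a new dominance pattern starts and then persists.

    records is a list of (sol, (pc1, pc2, pc3)) tuples in sol order. A
    switch counts only if the new pattern holds for min_run consecutive
    records, so brief excursions are ignored.
    """
    patterns = [dominance(scores) for _, scores in records]
    boundaries = []
    current = patterns[0]
    for i in range(1, len(patterns)):
        run = patterns[i:i + min_run]
        if (patterns[i] != current and len(run) == min_run
                and all(p == patterns[i] for p in run)):
            boundaries.append(records[i][0])
            current = patterns[i]
    return boundaries


# Invented component scores by (hypothetical) sol; not the Gusev results.
records = [
    (10, (-1.0, 1.2, 0.1)),
    (20, (-0.8, 1.0, 0.0)),
    (30, (0.3, -0.5, 0.1)),   # brief excursion, ignored by min_run
    (40, (-0.9, 1.1, 0.2)),
    (50, (0.1, -1.5, 0.6)),
    (60, (0.0, -1.2, 0.5)),
    (70, (0.2, -1.4, 0.7)),
    (80, (0.1, -1.0, 0.4)),
    (90, (1.3, 0.1, -2.0)),
    (100, (0.9, 0.0, -1.1)),
    (110, (1.6, -0.1, -2.4)),
]
print(zone_boundaries(records))
```

Real component scores are noisier than this toy series, so the persistence threshold would need tuning, and the visual judgment students exercise on the chart remains the point of the exercise.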

Turning to a map of Spirit's traverse
<http://marsrovers.jpl.nasa.gov/mission/tm-spirit/images/MERA_A1457_2_br2.jpg>,
as well as a labeled Spirit Pancam image
<http://marsrover.nasa.gov/mission/tm-spirit/images/sol_572_in_sol149Pan.jpg>,
students should find the two
dates marking the boundaries of the three zones. They will
find that the first zone is spatially by far the most
extensive, a long, almost straight shot across a cratered basaltic
plain. The second zone is the short curving segment
around the westernmost spur of the Columbia Hills, a
territory featuring the bedded rocks that the MER team had
originally hoped to find when selecting Gusev Crater for
Spirit's landing. The third zone is the ascent into the
Columbia Hills, where there proved to be a great diversity of
rock and soil types. Team interest sent the rover to
explore this diversity, leading to the very spiky pattern in
the third zone. This third zone, then, foregrounds the
team's interests perhaps even more than the tenor of the
terrain itself.

Depending on time available, faculty can have
students pick out finer scale features, too. Students should
note the sols at which components may switch "polarity"
or magnitude of scores for brief spells within the three
zones and then compare those sols on the traverse map
with labeled features. In the first zone, for example,
students easily spot the signals of Spirit crossing onto
Bonneville Crater's ejecta blanket, then its movement about the
rim, and then its descent down the blanket toward
Missoula Crater. The ejecta blanket surface produces especially
marked and persistent divergences in PC1 and PC2, where
the impact excavated and ejected deeper basaltic materials.

In the following section, the surface characteristics of
the three zones emerging from PCA are discussed. The first
zone consists of cratered basaltic plains. The second
often features bedded materials with evidence of evaporites. The
third zone is a complex mix of diverse materials suggestive
of acid-aqueous alteration. The discussion section also
includes consideration of finer scale subzones in each of
the three major zones and then finishes with a discussion
of processes creating the three main zones.

DISCUSSION OF THE ZONATION PRODUCED 
WITH PCA IN GUSEV CRATER 

Statistical results and graphs need to be interpreted 
within the concepts of the disciplines generating them. 
These are challenging enough in this case to require a fair 
amount of unobtrusive faculty facilitation for students to 
understand. Faculty in geoscience disciplines, for the most 
part, work on Earth, and Mars is peripheral to their normal 
activities. There are many excellent books and other 
resources to become more comfortable with Mars, but a 
comprehensive work on the martian surface that includes 
the new data from the MER rovers is Carr (2006). 

In terms of statistical misunderstandings, it is easy for
students to interpret negative component scores as "low"
scores and positive component scores as "high" scores, for
example. It is important to get across that principal
components are rather like see-saws, with, in this case, different
oxides and elements "seated" on opposite ends of each
component. When the positive side swings up strongly
with highly positive component scores, so do the chemicals
seated on that side (those with positive loadings on the
component). Similarly, the negative side can also swing up
into high (negative) component scores, lifting the chemicals
with negative loadings into view. Understanding which
oxides and elements are "lifted into the air" (diverge from
the neutral 0 component score line in either direction) is
important for figuring out the nature of the surface. With
these precautions, Fig. 2 shows Spirit's transect divided
into the three zones created by PCA.
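
The see-saw reading can be sketched in Python using the component 1 loadings from Table I. The function name and the 0.6 loading threshold are assumptions chosen for illustration; the text does not prescribe a cutoff. The sketch only reports which variables sit on the end of the see-saw that a given score sign lifts; it does not compute component scores.

```python
# Component 1 loadings from Table I.
PC1_LOADINGS = {
    "Na2O": 0.772, "MgO": -0.629, "Al2O3": 0.769, "SiO2": 0.169,
    "P2O5": 0.710, "SO3": -0.157, "Cl": 0.101, "K2O": 0.751,
    "CaO": 0.196, "TiO2": 0.866, "Cr2O3": -0.830, "MnO": -0.642,
    "FeO": -0.850, "Br": 0.099, "Ni": -0.313,
}


def lifted_variables(loadings, score, threshold=0.6):
    """Variables on the end of the see-saw raised by a score of this sign.

    threshold is a hypothetical cutoff for a "high" absolute loading.
    """
    if score == 0:
        return []
    sign = 1 if score > 0 else -1
    return [name for name, loading in loadings.items()
            if loading * sign >= threshold]


# A strongly negative PC1 score lifts the negatively loading oxides.
print(lifted_variables(PC1_LOADINGS, score=-1.5))
# A strongly positive PC1 score lifts the positively loading oxides.
print(lifted_variables(PC1_LOADINGS, score=1.5))
```

For a negative score, the variables returned are the magnesium, chromium, manganese, and iron oxides, which is why a negative PC1 score should not be read as "low" anything.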

Zone 1: Cratered Basaltic Plain 

So, in the first zone of Spirit's transect, PC1 diverges
strongly in the negative direction. This calls attention to the
dominance of ferrous oxide, magnesium oxide, manganese
oxide, and chromium sesquioxide. These oxides indicate
olivines and pyroxenes and other minerals associated with
basalts and the highest temperatures along the reactive
branch of the Bowen reaction series. This component,
shifted so far negatively, hints at the lack of aqueous or
acid-aqueous alteration along the olivine-to-feldspar join in
A-CNK-FM compositional space (Nesbitt and Young, 1989).
It also expresses the aeolian deposition of thin coatings of
iron oxides on rock and regolith surfaces. These oxides were
liberated from basalts by the action of atmospheric oxidants,
such as hydrogen peroxide. They have then been carried
around the planet by winds to the point of near
homogeneity of global dust composition (Yen et al., 2005).

FIGURE 2: Spirit traverse map showing median component scores and PCA-derived zonation of the traverse from
sols 14 through 470.

PC2, meanwhile, diverges very strongly in the positive
direction in this zone, carrying calcium oxide into
prominence. Since calcium is common in basalts and calcium
plagioclase dominates the highest temperatures in the
nonreactive arm of the Bowen reaction series, the upward
swing of PC2 is not surprising. It reinforces the impression
of a basalt and basaltic regolith landscape dominated by
oxides of siderophilic and lithophilic elements. This is the
same signal picked up by the negative swing in PC1.

Zone 2: Evaporites 

The second zone, which Spirit entered around sol 158
as it began its exploration of the West Spur of the Columbia
Hills, sees PC2 swing strongly into the negative direction.
This carries into prominence the three elements that loaded
strongly onto the negative end of PC2: chlorine, bromine,
and nickel. Nickel is associated with certain meteorites, so
its presence on any martian surface is not surprising.

Chlorine and bromine, however, are markers of evaporative
concentration. They were often found within cracks
and voids in rocks analyzed by the APXS, starting in the
latter part of zone 1 and then very prominently in zone 2
(Erickson et al., 2005). These two halogens, then, constitute
a hint of water or groundwater. This hint counters the
impression of basalts highlighted in the lab's PCA trends
back in zone 1. Mössbauer spectroscopy on the Spirit rover
supports the PCA identification of a mafic surface there.
This instrument detects iron-bearing minerals and identified
an abundance of unaltered or very weakly altered olivine along
Gusev's transect in zone 1 (Morris et al., 2006). Olivine has
a strong proclivity for rapid alteration in the presence of
water, so its prevalence in the first zone suggests dryness
after the basalt flow event. With the two halogens made
prominent by the negative deviation of PC2, then, this second
zone evidences the presence of small amounts of water
in the older materials outcropping above the basalt surface.
These were probably in the form of groundwater or frost
deposition and subsequent aqueous alteration of regolith.

The component most frequently diverging in the positive
direction, though not too strikingly, is PC3. The only
chemical to load strongly positively on PC3 is silica. On
Earth, silica can derive from magma fractionation in the
crust or from alteration of basaltic materials through sulfate
geochemistry. Mars had a great deal of sulfur pumped
into its atmosphere by volcanic activity from ~4.2 billion
years ago (Ga) to ~3.8 Ga. This was copious enough to
produce geochemical cycles dominated by very acidic sulfate
chemistry. So, these silicas may reflect the "Theiikian"
or sulfate era (Bibring et al., 2006; McAdam et al., 2008).

In this region, PC1 occasionally surpasses PC3 in positive
deviation, carrying a signal of the oxides of aluminum,
sodium, potassium, and titanium. These similarly concentrate
by fractionation but in such minerals as orthoclase
and sodium feldspar. The Spirit team noted that the rock
materials in zone 2 were softer for the RAT to cut into
(Erickson et al., 2005). They commented on a trend of
increasing detection of small amounts of water alteration
in the cratered basaltic plains along the long straight trajectory
occupying the rover until sol 158. They noted that some
of the rock appears layered after sol 158, comprising a mix
of fine and massive beds, each of which shows relatively
poor size sorting and includes some large grain sizes. This
suggests deposition in a high-energy environment, such as
impact gardening, with subsequent alteration and softening
by more water than is evidenced on the basaltic floor
of Gusev. This, no doubt, accounts for the change in the
polarity and magnitudes of the three principal components
marking the transition from zone 1 to zone 2.

Zone 3: Diversity in Aqueous Sulfate Geochemistry 

As Spirit began to climb the northwestern slope of
Husband Hill in the Columbia Hills after sol 315, the landscape
took on a third character geochemically as well as
topographically. In this zone, PC3 swings negatively, in a
couple of cases quite spectacularly so. Sulfur trioxide is the
chemical with a strong negative loading on PC3, bringing
up sulfur chemistry again. Sulfate itself (SO4) is not part of
the database derived from Gellert et al. (2006), but Erickson
et al. (2005) comment that sulfate was abundant in the
individual rocks and soils. This corresponds to the sharp negative
deviation in PC3 seen in this lab. Along with the SO3
highlighted by PC3's negative deviation, the sulfates mentioned
by Erickson et al. suggest an aqueous chemistry, the
kind associated with the acidic waters produced by sulfate
geochemistry (McAdam et al., 2008). Reinforcing the
impression of sulfate chemistry in the third zone are the
half dozen samples in which PC1 drops sharply into negative
scores. Erickson et al. (2005) describe these as basaltic
grains cemented by magnesium sulfate salts (the "Peace"
and "Alligator" rocks).

It is PC1, however, that diverges strongly positively in
most of the third zone, foregrounding the oxides of titanium,
sodium, aluminum, potassium, and phosphorus.
These are often seen in the granites and rhyolites (quartz
and the potassium and sodium feldspars) that result from
the final fractionation of magmas in the Bowen reaction series,
but Mars is not noted for strong magma fractionation.
Gellert et al.'s (2006) paper suggests instead that water
acidified by sulfates and chlorine tends not to leach feldspars
with any efficiency. This may account for their presence
or persistence in this zone as seen by the felsic oxides
detected by the APXS, which again underscores acidic
aqueous action.
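
The coarse zonation described above can be sketched as a simple grouping of samples by sol, followed by a per-zone median of the component scores (the statistic shown in Fig. 2). This is only a sketch: the boundaries are the approximate sols given in the text (zone 2 beginning around sol 158, zone 3 after sol 315), the function names are hypothetical, and the scores in the demonstration list are invented for illustration rather than taken from the lab data.

```python
from statistics import median

# Approximate zone boundaries from the text: zone 2 begins around
# sol 158 and zone 3 after sol 315. Treat both as assumptions.
def zone_for_sol(label):
    """Map a sample label such as '150a' or 197 to zone 1, 2, or 3."""
    sol = int(str(label).rstrip("abcdefghijklmnopqrstuvwxyz"))
    if sol < 158:
        return 1
    if sol <= 315:
        return 2
    return 3

def median_scores_by_zone(samples):
    """samples: iterable of (sol_label, (pc1, pc2, pc3)) tuples."""
    grouped = {}
    for label, scores in samples:
        grouped.setdefault(zone_for_sol(label), []).append(scores)
    # zip(*rows) turns the rows of one zone into three columns of scores.
    return {
        zone: tuple(median(column) for column in zip(*rows))
        for zone, rows in sorted(grouped.items())
    }

# Hypothetical component scores, invented purely for illustration.
demo = [
    ("18", (-1.2, 1.5, 0.1)),
    ("150a", (-0.8, 1.1, -0.2)),
    ("197", (0.2, -1.4, 0.6)),
    ("300", (0.4, -0.9, 0.3)),
    ("401", (1.1, -0.1, -2.5)),
    ("427", (0.9, 0.2, -2.1)),
]
medians = median_scores_by_zone(demo)
```

The median is used here because a few extreme samples, such as the strongly sulfate-bearing rocks in zone 3, would otherwise dominate a mean.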

Finer Scale Zonation 

Instructors might opt to have students tackle finer-scale
zonation as well. Each of the three zones shows subzones
that depart somewhat from the overall component
pattern in the zone.

In zone 1, for example, the most extreme divergence of
PC1 and PC2 (roughly sols 18-63 and again sols 82-150a)
coincides with the ejecta blankets around Bonneville and
Missoula craters. Sols 65-81b show a convergence of all
three components toward neutral, which coincides with
Spirit's exploration of the rocks along Bonneville Crater's rim.

In zone 2, students could look for areas that are
extremely rich in halogens and carry a suggestion of silica
(sols 197-199, 228-235, 300-304). Another subzone type
features halogen-rich areas with felsic oxides and the
acid-aqueous alteration they imply (sols 266-274, 291). Students
can also spot areas close to neutral on all three components,
suggesting aeolian homogenization (sols 172-178,
227, 240-259).

In zone 3, students can identify an area of marked
alteration toward the oxides of elements common in felsic
rocks, with PC1 scores predominantly strongly positive
(sols 334-357). Another area adjacent to it has strongly negative
PC1 scores. This indicates mafic oxides, and this area
also shows a weak halogen and sulfate signal from somewhat
negative PC2 and PC3 scores (sols 374-385b). Immediately
adjacent is another area in which PC1 scores return
to strongly positive values but with two rocks showing
extremely negative PC3 scores. Scores on these two rocks
indicate a very strong sulfate signal (sols 401 and 427).
Zone 3 shows the most internal variation of all three zones.
This reflects both greater diversity of rocks and soils in the
Columbia Hills and the Spirit team's interests in exploring
the extremes of diversity there.
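
The reading of subzones from component signs can be sketched as a lookup. The groupings below follow the loadings named in the zone descriptions above; the 0.5 cut-off for treating a score as "near neutral" and the function name interpret are assumptions made for illustration, not values from the lab.

```python
# Chemicals foregrounded by each component and sign, as described in
# the text. The cut-off value is an assumption, not a lab parameter.
HIGHLIGHTS = {
    ("PC1", "-"): ["FeO", "MgO", "MnO", "Cr2O3"],
    ("PC1", "+"): ["Al2O3", "Na2O", "K2O", "TiO2", "P2O5"],
    ("PC2", "+"): ["CaO"],
    ("PC2", "-"): ["Cl", "Br", "Ni"],
    ("PC3", "+"): ["SiO2"],
    ("PC3", "-"): ["SO3"],
}

def interpret(scores, cutoff=0.5):
    """Return the chemicals foregrounded by one sample's scores.

    scores maps 'PC1', 'PC2', 'PC3' to component scores. An empty
    list means the sample sits near neutral on every component.
    """
    flagged = []
    for component, value in scores.items():
        if value >= cutoff:
            flagged += HIGHLIGHTS[(component, "+")]
        elif value <= -cutoff:
            flagged += HIGHLIGHTS[(component, "-")]
    return flagged

# A hypothetical zone 1 style sample: PC1 negative, PC2 positive.
mafic = interpret({"PC1": -1.4, "PC2": 1.2, "PC3": 0.1})
# A hypothetical sample near neutral on all three components.
neutral = interpret({"PC1": 0.1, "PC2": -0.2, "PC3": 0.0})
```

A student could run each sample through such a function and then look for runs of consecutive sols that share the same flagged chemicals.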

From Zonation to Processual Analysis 

In short, then, the 15 oxides and elements in this PCA
lab exercise yield three principal components that produce
a coarse but clear zonation of three different surface types
by geochemistry. These are visually distinct on the Spirit
traverse map. If an instructor desires, students can search
for several finer-scaled subtypes within each zone. The
three main zones can be turned into a meaningful geoscience
narrative even by undergraduate students. To do so,
they must apply their introductory general geology or
physical geography coursework preparation, which will
require some facilitation by their faculty. Students should
have enough information from their previous coursework
and the lab itself to posit a plausible history along the lines
of Mars accreting and forming a crust, followed by a period
of bombardment and impact cratering. During or after
the bombardment, there was a possibility of fluvial deposition
of sediments into Gusev Crater by Ma'adim Vallis.
With or without such deposition, there clearly was aqueous
(groundwater?) alteration of impact-gardened regolith
on the floor of Gusev. After these fluvial and/or aqueous
alteration processes had left their marks, volcanic activity
(perhaps from Apollinaris Patera to the north of Gusev
Crater) covered some of these sediments with basaltic lava.
This would have been at a time of continued strong bombardment,
as the basalt is heavily cratered. Bombardment
went on very heavily until ~3.7 Ga and continues at a
drastically lower rate even today. The Columbia Hills were
stranded as an outcrop of the older water-altered sediments
above the younger lava fill. After the volcanic flow
event and after the bombardment of Mars' surface
dwindled, the long, slow desiccation, oxidation, and aeolian
homogenization of "modern" Mars began, veneering
rocks and soil with iron oxide dust.

A geological timeline must remain imprecise on Mars
until the Mars Sample Return Lander and subsequent missions
can return rock materials for radiometric dating. Dating
of surfaces now depends on crater counting techniques
(Hartmann, 2005) and geological reasoning from superposition
relationships.

The martian timeline has been divided into three periods
(Barlow, 2008), named for type regions. The oldest is
the Noachian, which lasted from planetary formation until
~3.7 Ga. This was a period characterized by severe bombardment
and the collapse of Mars' planetary magnetic
field. It also featured aqueous processes, possible
precipitation-fed surface flow and standing water, and associated
fluvial processes. Surfaces are badly cratered but, here and
there, dendritic networks believed to be surface water
channels are seen. The Hesperian, debatably ending ~3
Ga, was a time of high volcanic activity and extreme flood
events represented by the largest outflow channels. These
massive outflows were perhaps triggered by magma interaction
with subsurface water and ice. The most recent period
is the Amazonian, characterized by desiccation,
oxidative geochemistry, loss of most of the atmosphere,
and, ironically, the dominance of aeolian processes. Given
the lack of radiometric dating and lingering controversies
over crater-counting, the boundaries among the three periods
are somewhat variable in the literature.

An alternative periodization (Bibring et al., 2006)
focuses on dominant geochemical weathering processes.
The oldest era in this scheme is the Phyllosian, named for
phyllosilicate clays associated with neutral to alkaline
water. The Phyllosian coincides with the early to middle
Noachian. The second era is the Theiikian, extending from
~4.1 to 3.5 Ga, or from the late Noachian through much of
the Hesperian. In this era, Mars switched into a
sulfate-dominated acidic geochemistry, possibly due to the massive
and pervasive volcanic activity of the late Noachian
and Hesperian periods. The Siderikian era roughly coincides
with the Amazonian period and is characterized by
oxidative weathering of the mafic rocks, which are so common
on the martian surface. This is connected with decline
(and spatial concentration) in volcanism, loss of atmospheric
pressure, and nearly instantaneous evaporative loss
of any liquid water exposed at the surface.

The zones identified by PCA in this lab can be linked
tentatively to these timeframes. Zone 2 exposes the older
rocks and soils evidencing acidic water, probably groundwater,
and probably in small amounts. The prominence of
the halogens implies evaporative concentration, a Mars already
beginning to lose its surface waters. This suggests
Theiikian processes going back to the later Noachian or
early Hesperian. Zone 1 is covered by low-viscosity basaltic
flows. These flows are possibly a signal of the heightened
volcanic activity of the latest Noachian and
Hesperian, not so young as to escape serious impact pummeling
but young enough to show a nearly completely dry
Mars. Zone 3 is in many ways an extension of the acidic
alteration seen in zone 2. Sulfate and oxidative geochemistry
is the keynote here, marking the acidic Theiikian processes
and transition into the Siderikian ones. The hilly
surface is probably approximately the same age as those of
zone 2 but with a strong aeolian signal as the oxidized dust
of Amazonian Mars concentrated in certain sites here.

The linkage of zones with specific periods and eras of
martian geological time is not something that can be
expected of students in a geoscience course in statistics. It
could, however, be asked of advanced geoscience students
in a planetary geoscience course that made use of this data
set and PCA. Students in a statistics course in a geoscience
department can still be expected to come up with a
reasonable sequence of events shaping the surface of
Gusev Crater, applying superposition relationships to the
zones produced in the lab.


STUDENT LEARNING OUTCOMES ASSESSMENT

I have utilized this laboratory exercise in two elective
sections of multivariate statistics during the Spring of 2008
and the Spring of 2009, after having taught PCA with a different
data set in two earlier sections in the Spring
of 2001 and the Spring of 2006. The course focuses mainly
on multiple regression, multivariate binary logistic regression,
and PCA. A multivariate statistics course often does
not generate a sufficient enrollment in the many departments
that want to expose their majors to such techniques.
As a result, the geography course on our campus has been
promoted by advisors outside of the department. I, therefore,
try to include exercises that accommodate the interests
of the many different kinds of students in the class
each semester. Originally, this exercise was designed to
pique the interest of several geology, physical geography,
and environmental science majors in the Spring 2008
section.

What follows is a description of three assessments of
the outcomes of this lab exercise, organized by the timeframe
of impact. The first describes the immediate grade
performance of the "Mars-enhanced" sections in comparison
with the others that learned PCA using a U.S.
Census-derived human geographic database: the "Mars treated"
group and the "control" group. A second phase of this
assessment comprises a qualitative anecdote about the participation
of a graduate student in a research team, in
which I was involved, as she independently applied PCA
to a major project stretching over a year past the end of the
class. The third phase summarizes the responses to an e-mail
survey I distributed to these two sections of the class
long after the course had concluded, in order to evaluate
the lab's long-term effects.

Student Grade Performance in Class 

The Mars treated group consisted of 32 students, the
"control" group of 29. To assess how the Mars treated
group compared with the control group in terms of how
well they had learned PCA, I conducted a basic pretest and
post-test evaluation. This entailed using scores on an introductory
lab given to both groups in order to establish
whether the two student groups were statistically comparable.
This was the pretest. The post-test compared their
performance on the PCA lab as a diagnostic of their relative
success in understanding PCA. Scores on both labs
used a 100 point scale. Intergroup differences were evaluated
with a t-test of the difference of means, with a probability
of <0.05 deemed significant.

The first lab project in the course is a refresher on basic
simple linear regression using variously transformed variables.
The mean scores of the two groups of students were
89.7 and 87.5 out of 100, with standard deviations
of 5.8 and 8.6, respectively. A t-test of the difference
of means yielded a t score of 1.18 (prob=0.24), establishing
that the two groups were not significantly different as the
class began (the pretest). This finding, then, justified proceeding
with the post-test. The fifth lab was the PCA exercise.
The Mars treatment group averaged 88.0, with a
standard deviation of 9.9, while the control group earned
82.9, with a standard deviation of 10.2. This yielded a t
score of 2.0, which was significant (prob < 0.05). The Mars
treatment group did significantly better than the control
group in demonstrating their mastery of this complex
technique.
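
The two comparisons reported above can be checked from the summary statistics alone. The sketch below is one way to do so, assuming a pooled-variance (Student's) t-test and assuming that the first mean and standard deviation in the pretest pair belong to the 32-student Mars treated group; neither assumption is stated explicitly in the text. The function names are hypothetical, and the p value is obtained by simple numerical integration of the t density rather than from a statistics library.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t for a difference of means, from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df

def t_density(x, df):
    """Probability density of Student's t distribution."""
    norm = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2)
    )
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p value via Simpson's rule on the interval [0, |t|]."""
    upper = abs(t)
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central_area = 2 * (total * h / 3)  # area between -|t| and +|t|
    return 1 - central_area

# Group sizes and summary statistics as reported in the text.
t_pre, df_pre = pooled_t(89.7, 5.8, 32, 87.5, 8.6, 29)
t_post, df_post = pooled_t(88.0, 9.9, 32, 82.9, 10.2, 29)
p_pre = two_sided_p(t_pre, df_pre)
p_post = two_sided_p(t_post, df_post)
```

Under these assumptions the pretest statistic reproduces the reported t of 1.18 on 59 degrees of freedom, and the post-test statistic rounds to the reported 2.0. Because the post-test value sits very close to the 0.05 threshold, whether it clears that threshold depends on the choice of a one- or two-tailed test, which the text does not specify.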

One Student’s Use of PCA After Class 

A geology master's student in the Spring 2008 course
subsequently applied PCA in her class final project and
then in a follow-on project. She used PCA to analyze
marine cores taken in the Santa Barbara
Channel off the Southern California coast, going back
~33,000 yr (calibrated). She used the deviations in the two
components that emerged from six paleoclimate proxies to
pick out the signals of several climate changes. These
included the terminal Pleistocene glaciation, pre-Bølling
warming, deglaciation, and the Holocene, as well as several
smaller-scale events, such as the Younger Dryas, four
Dansgaard-Oeschger events, the last glacial maximum,
and three Heinrich events. She went on to present her
work at the American Geophysical Union (Peterson et al.,
2008) and came back to my class a year later to discuss
what she had done and inspire the next cohort of students.
She is now in a Ph.D. program in earth science.

Survey 1-2 Years after Class 

In the Spring of 2010, I e-mailed the 32 students who
had taken the two sections that used this lab exercise and
asked them eight questions about the lab and its data
set:

(1) whether they felt they would be able to do a PCA
again if they had access to appropriate software,

(2) whether they ever had had a chance to do another
PCA,

(3) whether they have read about and been able to follow
others' use of PCA,

(4) their ranking of the three multivariate techniques
by personal interest in them,

(5) their ranking of the three by personal difficulty,

(6) their ranking of the four data sets I used to teach
the three techniques (gun crime and socioeconomic
data, archaeological site analysis, educational
assessment, and martian geochemistry),

(7) their ranking of the personal difficulty of the
databases, and

(8) an open-ended question on whether the martian
data made PCA harder or easier, more or less interesting,
or more or less memorable.

Eight students out of the 32 responded (25%). All eight
expressed confidence that they could do PCA on their own
using SPSS or other statistical software. Three stated that
they had had to employ a PCA since taking the course.
These three are graduate students, including the individual
profiled above. They reported later using the technique in
graduate seminars or in a conference presentation. Seven
reported encountering PCA in an article they had had to
read and said that their memory of the technique came
back and gave them a better understanding of what they
were reading.

In terms of ranking the techniques by degree of personal
interest, PCA was the favorite technique of six of the
students, with the other two listing it as least interesting.
PCA was ranked the most difficult technique by four and
the second most difficult by another four. No one selected
it over multiple regression or logistic regression as easiest.

In terms of data sets, the martian data were overall
rated the most interesting data set. This was in comparison
with the two data sets used for multiple regression (gun
crime data and an educational assessment) and for logistic
regression (archaeological site prediction). These impressions
were quite polarized, however: five rated it the most
interesting data set, while three rated it the least interesting.
There was no association with either the major or concentration
of the students. In terms of the difficulty of
working with the various data sets, though, the martian
data were rated the hardest to handle, with three students
rating them the hardest and another three rating them the
second hardest. Again, this rating cut across all majors and
concentrations.

The open-ended question generated a series of adjectives
and phrases about the use of martian data. These fell
into four clusters. Four students described them as interesting
or fun. Three characterized the martian data as making
it harder for them because of the amount of information outside
their training they had to absorb to get through the lab.
Three others said that these data made it easier for them to
learn PCA because they were so interesting. Six of the eight
said that the martian data made the technique much more
memorable and gave them a better and longer-lasting
understanding of the technique, whether or not they found
it interesting or difficult.

CONCLUSIONS 

The triangulation of three different forms of evidence
encourages wider use of these martian geochemical data in
geoscience statistics courses. First, there was significant
improvement in student grade performance in the PCA lab
using the martian data over the performance level of those
using a different data set. Second, a graduate student left
the class immediately able to apply the technique to her
own research. Third, the survey showed that, despite different
levels of student interest and difficulty, students reported
an enduring ability to understand PCA-based research and
found that the martian data made the technique memorable
long after the class ended.

These mutually reinforcing lines of evidence may
make this data set appealing to other geoscience faculty
teaching multivariate statistics or geostatistics. It is always
a challenge to find data that can be processed with the
technique to be taught and that produce clear results readily
linked back to the discipline. I highly recommend wider
use of the data provided by the Gellert et al. team (2006) for
any educator teaching statistics in a geosciences program,
as well as efforts to evaluate the replicability of the results
reported here.

Potential lines for further work include the following.
First, other instructors who teach PCA in statistics or quantitative
methods courses could try alternating the use of
this martian geochemistry data set with whichever data set
they currently use. The pretest/post-test methodology
described here would be easily implemented and could
allow multiple pairs of Mars treatment and non-Mars treatment
groups to be evaluated for student mastery of the
PCA technique. Second, geoscience instructors teaching
geomorphic or geological zonation could utilize this lab to
do a similar comparison of "PCA treated" and "non-PCA
treated" student groups. They could evaluate whether exposure
to PCA promotes a better understanding of zonation
in comparison with other techniques, such as field
observation, air photo interpretation, or software-mediated
classification of remote sensing data. Construction of a
shared assessment data clearinghouse on either of these
topics could facilitate curricular development in geoscience
departments. Such a clearinghouse could be made public
through the Digital Library for Earth System Education
(DLESE) or the Education Resources Information Center
(ERIC).